Large-Scale Experiments with NP Chunking of Polish
Abstract
Published experiments on shallow parsing for Slavic languages have been characterised by the small size of the corpora used. The publication of the National Corpus of Polish (NCP) opened a new opportunity: to test several chunking algorithms on the 1-million-token manually annotated subcorpus of the NCP. We test three Machine Learning techniques: Decision Tree induction, Memory-Based Learning and Conditional Random Fields. We also investigate the influence of tagging errors on the overall chunker performance, which happens to be ...
Similar resources
Experiments in Base-NP Chunking and Its Role in Dependency Parsing for Thai
This paper studies the role of base-NP information in dependency parsing for Thai. The baseline performance reveals that the base-NP chunking task for Thai is much more difficult than those of some languages (like English). The results show that the parsing performance can be improved (from 60.30% to 63.74%) with the use of base-NP chunk information, although the best chunker is still far from ...
An Empirical Study of Vietnamese Noun Phrase Chunking with Discriminative Sequence Models
This paper presents an empirical study of the Vietnamese NP chunking task. We show how to build an annotated corpus for NP chunking and how discriminative sequence models are trained using the corpus. Experimental results using 5-fold cross-validation show that discriminative sequence learning is well suited to Vietnamese chunking. In addition, by empirical experiments we show that the part ...
Fast NP Chunking Using Memory-Based Learning Techniques
In this paper we discuss the application of Memory-Based Learning (MBL) to fast NP chunking. We first discuss the application of a fast decision tree variant of MBL (IGTree) on the dataset described in (Ramshaw and Marcus, 1995), which consists of roughly 50,000 test and 200,000 train items. In a second series of experiments we used an architecture of two cascaded IGTrees. In the second level o...
Memory-Based Shallow Parsing (Proceedings of CoNLL-99, Bergen, Norway, pp. 53-60)
We present a memory-based learning (MBL) approach to shallow parsing in which POS tagging, chunking, and identification of syntactic relations are formulated as memory-based modules. The experiments reported in this paper show competitive results on the Wall Street Journal (WSJ) treebank, with F-scores reported for NP chunking, VP chunking, subject detection, and object detection.
NP Alignment in Bilingual Corpora
We created a simple gold standard for English-Hungarian NP-level alignment based on Orwell’s 1984 (since this already exists in manually verified POS-tagged format in many languages thanks to the Multex and MultexEast project) by manually verifying the automatically generated NP chunking (we used the yamcha, mallet and hunchunk taggers) and manually aligning the maximal NPs and PPs. The maximum NP chun...
Publication date: 2012